Proverb Variation: Experiments on Automatic Detection in Brazilian Portuguese Texts
نویسندگان
چکیده
This paper describes a methodology for automatically identifying proverbs and their variants in running texts. This methodology is based on existing compilations of proverbs, by exploring the regular syntactic structures that most proverbs present and intersecting syntactic structure with the lexical units of the proverbs. From the syntactic regularities we divided the data into 13 different classes. Finite-state automata is used to represent the regular patterns found in the classes. The results showed a precision rate of 74.68% tested in Brazilian Portuguese journalistic corpus.
منابع مشابه
Portuguese Proverbs: Types and Variants
Drawing on the methodology and previous results of Rassi et al. (2014) on the automatic identification of Brazilian Portuguese proverbs, this paper reports on an extension of that experiment, but now focused on the identification of the European Portuguese proverbs and their variants. Based on a large collection of over 56 thousand Portuguese proverbs and their variants, a database of proverb t...
متن کاملAutomatic Detection of Proverbs and their Variants
This article presents the task of automatic detection of proverbs in Brazilian Portuguese, from the intersection of the regular syntactic structure of proverbs and their core elements. We created finite-state automata that enabled us to look for these word combinations in running texts. The rationale behind this method consists in the fact that although proverbs may have a normal sentence struc...
متن کامل‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel
This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...
متن کاملFully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora
This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at documentand sentence-level. The resulting corpora contain about 2,700 parallel documents totaling over 150,000 al...
متن کاملReview and Evaluation of DiZer - An Automatic Discourse Analyzer for Brazilian Portuguese
This paper presents the review and evaluation of DiZer – an automatic discourse analyzer for Brazilian Portuguese. Based on Rhetorical Structure Theory, DiZer is a symbolic analyzer that makes use of linguistic patterns learned from a corpus of scientific texts to identify and build the discourse structure of texts. DiZer evaluation shows satisfactory results for scientific texts. In order to t...
متن کامل